Cache Mapping in Challenge/Onyx

The cache design in the Challenge/Onyx line depends on the CPU model in use. The basic Challenge/Onyx uses the IP19 with an R4000 processor. This CPU board uses a simple algorithm to assign a memory location to a cache line: the address of a byte of data is taken modulo the cache size to generate the cache address. This means that two words that are separated in main memory by an exact multiple of the cache size are always loaded to the same cache location.
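The modulus mapping described above can be sketched in a few lines of C. This is only an illustration: the helper names cache_offset and same_cache_location are hypothetical, and the sketch ignores details such as cache line size and the difference between virtual and physical addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: simple modulus mapping.
   Returns the byte offset within the cache that an address maps to. */
static size_t cache_offset(uintptr_t addr, size_t cache_size)
{
    assert(cache_size > 0);
    return (size_t)(addr % cache_size);
}

/* Nonzero when two addresses compete for the same cache location. */
static int same_cache_location(uintptr_t a, uintptr_t b, size_t cache_size)
{
    return cache_offset(a, cache_size) == cache_offset(b, cache_size);
}
```

With a 1 MB cache (1048576 bytes), same_cache_location reports a conflict for any two addresses whose difference is a multiple of 1048576, and no conflict for addresses only a few bytes apart.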

Only one of the words can occupy the cache at a time, so if your program alternates between the two words, it incurs a cache miss on each reference. It is surprisingly easy to create this situation. The following code fragment performs poorly on a Challenge/Onyx with a 1 MB cache.

float part1[262144];   /* 1 MB */
float part2[262144];   /* adjacent 1 MB */
int j;
for (j = 0; j < 262144; ++j)
    part1[j] = part2[j];

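In that code fragment, the two arrays are exactly one cache size apart, so part1[j] and part2[j] always map to the same cache line. Loading part2[j] evicts the line that holds part1[j], and storing part1[j] evicts the line that holds part2[j], so each assignment in the loop incurs two cache misses. (Some Challenge/Onyx systems have caches of different sizes, but the same principle applies.)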
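To check the conflict directly, the following sketch counts how many element pairs collide under modulus mapping. It rests on several assumptions: a 1 MB cache, a 4-byte float, and a struct wrapper (not in the original fragment) that forces the two arrays to be adjacent. The names arrays, CACHE_SIZE, and count_conflicts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 262144                 /* elements per array, as in the fragment */
#define CACHE_SIZE (1024 * 1024) /* assumed 1 MB cache */

/* Struct members guarantee that part2 immediately follows part1. */
static struct {
    float part1[N];
    float part2[N];
} arrays;

/* Count indices j where part1[j] and part2[j] map to the same cache offset. */
static long count_conflicts(void)
{
    long conflicts = 0;
    long j;

    assert(sizeof(float) == 4);  /* assumption: each array occupies 1 MB */
    for (j = 0; j < N; ++j) {
        uintptr_t a = (uintptr_t)&arrays.part1[j];
        uintptr_t b = (uintptr_t)&arrays.part2[j];
        if (a % CACHE_SIZE == b % CACHE_SIZE)
            ++conflicts;
    }
    return conflicts;
}
```

Under these assumptions count_conflicts returns N: every one of the 262144 element pairs competes for the same cache location.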

Note: The cache in the R8000-based POWER Challenge does not use simple modulus mapping; it is an associative memory that is much more resistant to cache conflicts.